Fitting multiplicative models by robust alternating regressions.
In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from the residual plot. Also provided is an accompanying robust R²-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
Keywords: Alternating regression; Approximation; Biplot; Covariance; Dispersion matrices; Effects; Estimator; Exploratory data analysis; Factor analysis; Factors; FANOVA; Least-squares; Matrix; Median polish; Model; Models; Outliers; Principal components; Robustness; Structure; Two-way table; Variables; Yield
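The abstract above names median polish as the technique being extended. As a point of reference only, classical median polish on a two-way table can be sketched as follows; this is the textbook procedure, not the paper's robust alternating regression, and the function name `median_polish` and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def median_polish(table, n_iter=10):
    """Classical median polish: table[i, j] = overall + row[i] + col[j] + resid[i, j]."""
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        # Re-centre the column effects, moving their median into the overall term.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep column medians out of the residuals into the column effects.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        # Re-centre the row effects in the same way.
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall, row, col, resid
```

Because medians rather than means are swept out, a single aberrant cell tends to end up in the residual table instead of distorting the fitted row and column effects.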
Outlier Detection Using Nonconvex Penalized Regression
This paper studies the outlier detection problem from the point of view of
penalized regressions. Our regression model adds one mean shift parameter for
each of the $n$ data points. We then apply a regularization favoring a sparse
vector of mean shift parameters. The usual $L_1$ penalty yields a convex
criterion, but we find that it fails to deliver a robust estimator. The $L_1$
penalty corresponds to soft thresholding. We introduce a thresholding (denoted
by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A
version based on hard thresholding correctly identifies outliers on some hard
test problems. We find that $\Theta$-IPOD is much faster than iteratively
reweighted least squares for large data because each iteration costs at most
$O(np)$ (and sometimes much less), avoiding an $O(np^2)$ least squares estimate.
We describe the connection between $\Theta$-IPOD and $M$-estimators. Our
proposed method has one tuning parameter with which to both identify outliers
and estimate regression coefficients. A data-dependent choice can be made based
on BIC. The tuned $\Theta$-IPOD shows outstanding performance in identifying
outliers in various situations in comparison to other existing approaches. This
methodology extends to high-dimensional modeling with $p \gg n$, if both the
coefficient vector and the outlier pattern are sparse
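A minimal sketch of the hard-thresholding variant of the iterative procedure described above, assuming the mean-shift model y = Xβ + γ + noise. The function names `hard_threshold` and `ipod`, the stopping rule, and the fixed threshold `lam` are illustrative assumptions; the abstract's BIC-based tuning is not shown.

```python
import numpy as np

def hard_threshold(r, lam):
    """Keep entries whose magnitude exceeds lam; set the rest to zero."""
    return np.where(np.abs(r) > lam, r, 0.0)

def ipod(X, y, lam, n_iter=200, tol=1e-8):
    """Alternate between a least-squares fit on y - gamma and thresholding the residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # One-time reduced QR factorisation; each later iteration only needs
    # products with Q, so no new least-squares solve is required.
    Q, R = np.linalg.qr(X)
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        # Residuals of y after fitting the outlier-adjusted response y - gamma.
        resid = y - Q @ (Q.T @ (y - gamma))
        gamma_new = hard_threshold(resid, lam)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    # Regression coefficients from the adjusted response.
    beta = np.linalg.solve(R, Q.T @ (y - gamma))
    return beta, gamma
```

Observations with a nonzero entry in `gamma` are the ones flagged as outliers; `beta` is the coefficient estimate obtained after removing their estimated shifts.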
Predicting deadline transgressions using event logs
Effective risk management is crucial for any organisation. One of its key steps is risk identification, but few tools exist to support this process. Here we present a method for the automatic discovery of a particular type of process-related risk, the danger of deadline transgressions or overruns, based on the analysis of event logs. We define a set of time-related process risk indicators, i.e., patterns observable in event logs that highlight the likelihood of an overrun, and then show how instances of these patterns can be identified automatically using statistical principles. To demonstrate its feasibility, the approach has been implemented as a plug-in module to the process mining framework ProM and tested using an event log from a Dutch financial institution
Exploring Outliers in Crowdsourced Ranking for QoE
Outlier detection is a crucial part of robust evaluation for crowdsourceable
assessment of Quality of Experience (QoE) and has attracted much attention in
recent years. In this paper, we propose some simple and fast algorithms for
outlier detection and robust QoE evaluation based on the nonconvex optimization
principle. Several iterative procedures are designed with or without knowing
the number of outliers in samples. Theoretical analysis is given to show that
such procedures can reach statistically good estimates under mild conditions.
Finally, experimental results with simulated and real-world crowdsourcing
datasets show that the proposed algorithms could produce similar performance to
the Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up,
without or with prior knowledge of the sparsity size of outliers,
respectively. Therefore the proposed methodology provides us a set of helpful
tools for robust QoE evaluation with crowdsourcing data.
Comment: accepted by ACM Multimedia 2017 (Oral presentation). arXiv admin
note: text overlap with arXiv:1407.763
A COMPARISON OF METHODS FOR SELECTING PREFERRED SOLUTIONS IN MULTIOBJECTIVE DECISION MAKING
ISBN: 978-94-91216-77-0
In multiobjective optimization problems, the identified Pareto Frontiers and Sets often contain too many solutions, which makes it difficult for the decision maker to select a preferred alternative. To facilitate the selection task, decision making support tools can be used in different instances of the multiobjective optimization search to introduce preferences on the objectives or to give a condensed representation of the solutions on the Pareto Frontier, so as to offer to the decision maker a manageable picture of the solution alternatives. This paper presents a comparison of some a priori and a posteriori decision making support methods, aimed at aiding the decision maker in the selection of the preferred solutions. The considered methods are compared with respect to their application to a case study concerning the optimization of the test intervals of the components of a safety system of a nuclear power plant. The engine for the multiobjective optimization search is based on genetic algorithms
Robust high-dimensional precision matrix estimation
The dependency structure of multivariate data can be analyzed using the
covariance matrix $\Sigma$. In many fields the precision matrix $\Sigma^{-1}$
is even more informative. As the sample covariance estimator is singular in
high-dimensions, it cannot be used to obtain a precision matrix estimator. A
popular high-dimensional estimator is the graphical lasso, but it lacks
robustness. We consider the high-dimensional independent contamination model.
Here, even a small percentage of contaminated cells in the data matrix may lead
to a high percentage of contaminated rows. Downweighting entire observations,
which is done by traditional robust procedures, would then result in a loss of
information. In this paper, we formally prove that replacing the sample
covariance matrix in the graphical lasso with an elementwise robust covariance
matrix leads to an elementwise robust, sparse precision matrix estimator
computable in high-dimensions. Examples of such elementwise robust covariance
estimators are given. The final precision matrix estimator is positive
definite, has a high breakdown point under elementwise contamination and can be
computed fast
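To illustrate what an elementwise (cellwise) robust covariance matrix can look like, here is a minimal sketch that combines per-variable MAD scales with a rank-based (Spearman) correlation matrix. This particular pairing, the function name `elementwise_robust_cov`, and the absence of tie handling are assumptions made for illustration; the abstract does not specify which estimators the paper uses.

```python
import numpy as np

def elementwise_robust_cov(X):
    """Covariance built from robust per-column scales and a rank correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Robust scale per column: median absolute deviation, made consistent
    # with the standard deviation under normality by the factor 1.4826.
    med = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - med), axis=0)
    # Column-wise ranks (no tie handling; adequate for continuous data).
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Pearson correlation of the ranks is the Spearman correlation; being a
    # correlation matrix of actual data, it is positive semidefinite.
    corr = np.corrcoef(ranks, rowvar=False)
    return corr * np.outer(scale, scale)
```

Each entry depends only on the ranks and medians of the columns involved, so a few wild cells cannot inflate it the way they inflate the sample covariance. The resulting matrix could then be passed to a graphical lasso solver (for example scikit-learn's `graphical_lasso`) in place of the sample covariance.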
Gauge fields, ripples and wrinkles in graphene layers
We analyze elastic deformations of graphene sheets which lead to effective
gauge fields acting on the charge carriers. Corrugations in the substrate
induce stresses, which, in turn, can give rise to mechanical instabilities and
the formation of wrinkles. Similar effects may take place in suspended graphene
samples under tension.
Comment: contribution to the special issue of Solid State Communications on
graphen
An integrative clustering approach combining particle swarm optimization and formal concept analysis
- …